    Scalability Engineering for Parallel Programs Using Empirical Performance Models

    Performance engineering is a fundamental task in high-performance computing (HPC). By definition, HPC applications should strive for maximum performance. As HPC systems grow larger and more complex, the scalability of an application has become a primary concern. Scalability is the ability of an application to show satisfactory performance even when the number of processors or the problem size is increased. Although various analysis techniques for scalability have been suggested in the past, engineering applications for extreme-scale systems still occurs ad hoc. The challenge is to provide techniques that explicitly target scalability throughout the whole development cycle, thereby allowing developers to uncover bottlenecks earlier in the development process. In this work, we develop a number of fundamental approaches in which we use empirical performance models to gain insights into code behavior at higher scales.

    In the first contribution, we propose a new software engineering approach for extreme-scale systems. Specifically, we develop a framework that validates asymptotic scalability expectations of programs against their actual behavior. The most important applications of this method, which is especially well suited for libraries encapsulating well-studied algorithms, include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. We supply a toolchain that automates large parts of the framework, allowing it to be applied continuously throughout the development cycle with very little effort. We evaluate the framework with MPI collective operations, a data-mining code, and various OpenMP constructs. In addition to revealing unexpected scalability bottlenecks, the results show that it is a viable approach for the systematic validation of performance expectations.

    As the second contribution, we show how the isoefficiency function of a task-based program can be determined empirically and used in practice to control efficiency. Isoefficiency, a concept borrowed from theoretical algorithm analysis, binds efficiency, core count, and input size in one analytical expression, thereby allowing the latter two to be adjusted according to given (realistic) efficiency objectives. Moreover, we analyze resource contention by modeling the efficiency of contention-free execution. This allows poor scaling to be attributed either to excessive resource contention overhead or to structural conflicts related to task dependencies or scheduling. Our results, obtained with applications from two benchmark suites, demonstrate that our approach provides insights into fundamental scalability limitations or excessive resource overhead and can help answer critical co-design questions.

    Our contributions to better scalability engineering can be used not only in the traditional software development cycle, but also in related fields such as algorithm engineering, which applies the software engineering cycle to produce algorithms that can be adopted in applications more easily. Using our contributions, algorithm engineers can make informed design decisions, gain better insights, and save experimentation time.
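
    To make the isoefficiency idea concrete, the sketch below fits a hypothetical efficiency model to invented measurements and then solves for the input size that sustains a target efficiency as the core count grows. The model form, the data, and all names are illustrative assumptions, not the thesis' actual method.

```python
# Minimal sketch of empirical isoefficiency analysis; all data and the model
# form are hypothetical, not taken from the thesis.
# Assumed model: E(p, n) = 1 / (1 + c * p * log2(p) / n), i.e. a program whose
# total overhead grows as O(p log p) relative to the useful work n.
import numpy as np
from scipy.optimize import brentq, curve_fit

def eff_model(X, c):
    p, n = X
    return 1.0 / (1.0 + c * p * np.log2(p) / n)

# Hypothetical measurements: efficiency at (core count, input size) pairs.
procs = np.array([2, 4, 8, 16, 2, 4, 8, 16], dtype=float)
sizes = np.array([1e5, 1e5, 1e5, 1e5, 1e6, 1e6, 1e6, 1e6])
eff   = np.array([0.95, 0.90, 0.82, 0.70, 0.99, 0.98, 0.97, 0.95])

(c_fit,), _ = curve_fit(eff_model, (procs, sizes), eff, p0=[1.0])

def isoefficiency(p, target=0.8):
    """Smallest input size that sustains the target efficiency on p cores."""
    return brentq(lambda n: eff_model((p, n), c_fit) - target, 1.0, 1e15)

for p in (32, 64, 128):
    print(f"p = {p:4d}: need n >= {isoefficiency(p):.3e} to keep E >= 0.8")
```

    Solving the fitted expression for n at a fixed efficiency is exactly what lets the core count and input size be traded off against a given efficiency objective.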

    Exascaling Your Library

    Many libraries in the HPC field encapsulate sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm engineers have long been advocating the systematic combination of analytical performance models with practical measurements, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI as an example, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
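
    To illustrate the idea (this is not the authors' toolchain), the sketch below fits a few candidate scaling models to hypothetical latencies of a broadcast-like collective and asserts that the expected O(log p) trend explains the measurements best; all numbers are invented.

```python
# Hedged illustration of asymptotic scalability validation; the timings and the
# O(log p) expectation for a broadcast-like collective are hypothetical.
import numpy as np

procs = np.array([2, 4, 8, 16, 32, 64], dtype=float)
t_us  = np.array([3.1, 6.4, 9.2, 12.5, 15.9, 19.4])  # invented mean latencies

candidates = {
    "O(log p)":   np.log2(procs),
    "O(p)":       procs,
    "O(p log p)": procs * np.log2(procs),
}

def rss(feature, y):
    """Residual sum of squares of the least-squares fit y ~ a*feature + b."""
    A = np.vstack([feature, np.ones_like(feature)]).T
    _, res, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return res[0] if res.size else 0.0

best = min(candidates, key=lambda k: rss(candidates[k], t_us))
assert best == "O(log p)", f"scalability expectation violated: data fits {best}"
print(f"validated: measurements are consistent with {best}")
```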

    ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications

    Performance models are powerful tools allowing developers to understand the behavior of their applications, and empower them to address performance issues already during the design or prototyping phase. Unfortunately, the difficulties of creating such models manually and the effort involved render performance modeling a topic limited to a relatively small community of experts. This article summarizes the results of the two projects Catwalk, which aimed to create tools that automate key activities of the performance modeling process, and ExtraPeak, which built upon the results of Catwalk and worked toward making this powerful methodology more flexible, streamlined, and easy to use. The two projects both provide accessible tools and methods that bring performance modeling to a wider audience of HPC application developers. Since its outcome represents the final state of the two projects, we expand to a greater extent on the results of ExtraPeak.
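
    The core idea behind such automated modeling can be sketched as a search over a small space of simple, human-readable model terms. The Python fragment below is a strongly simplified illustration of that idea, not the projects' actual implementation; the search space and the timings are invented.

```python
# Strongly simplified sketch of automated empirical performance modeling:
# pick the best-fitting model t(p) = c0 + c1 * p^i * log2(p)^j from a small
# search space of exponents. Data and search space are hypothetical.
import itertools
import numpy as np

procs  = np.array([4, 8, 16, 32, 64], dtype=float)
time_s = np.array([1.02, 1.11, 1.24, 1.41, 1.63])  # invented measurements

best = None
for i, j in itertools.product([0, 0.5, 1, 1.5, 2], [0, 1, 2]):
    if i == 0 and j == 0:
        continue  # the pure constant is already covered by the intercept c0
    term = procs**i * np.log2(procs)**j
    A = np.vstack([np.ones_like(procs), term]).T
    coef, res, _, _ = np.linalg.lstsq(A, time_s, rcond=None)
    fit = (res[0] if res.size else 0.0, i, j, coef)
    if best is None or fit[0] < best[0]:
        best = fit

rss_best, i, j, (c0, c1) = best
print(f"best model: t(p) = {c0:.3f} + {c1:.3g} * p^{i} * log2(p)^{j}"
      f" (RSS = {rss_best:.2e})")
```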

    Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations

    Many libraries in the HPC field use sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm and performance engineers have long been advocating the systematic combination of analytical performance models with practical measurements, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer from the burden of having to specify and maintain very fine-grained and potentially non-portable expectations. In this way, scalability validation can be continuously applied throughout the whole development cycle with very little effort. Using MPI and parallel sorting algorithms as examples, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
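
    In a continuous-integration setting, such a check can take the shape of an ordinary unit test that asserts a trend rather than an absolute runtime. A minimal sketch with hypothetical data, threshold, and expectation:

```python
# Minimal sketch of a trend-based scalability regression test; the data, the
# threshold, and the log-like expectation for a parallel sort are hypothetical.
import numpy as np

def loglog_slope(procs, times):
    """Slope of log(time) over log(procs): ~0 means flat, ~1 means linear growth."""
    slope, _intercept = np.polyfit(np.log(procs), np.log(times), 1)
    return slope

def test_parallel_sort_scaling():
    procs = np.array([8, 16, 32, 64, 128], dtype=float)
    times = np.array([2.10, 2.31, 2.55, 2.78, 3.05])  # invented per-element costs
    slope = loglog_slope(procs, times)
    # Logarithmic growth shows up as a small log-log slope at these scales;
    # a slope near 1 would signal unexpected linear-in-p behavior.
    assert slope < 0.25, f"scaling trend degraded: log-log slope {slope:.2f}"

test_parallel_sort_scaling()
```

    Asserting the slope rather than raw timings is what keeps such a test portable across machines, in the spirit of verifying trends instead of precise expressions.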

    Automatic Performance Modeling of HPC Applications

    Many existing applications suffer from inherent scalability limitations that will prevent them from running at exascale. Current tuning practices, which rely on diagnostic experiments, have drawbacks because (i) they detect scalability problems relatively late in the development process, when major effort has already been invested into an inadequate solution, and (ii) they incur the extra cost of potentially numerous full-scale experiments. Analytical performance models, in contrast, allow application developers to address performance issues already during the design or prototyping phase. Unfortunately, the difficulties of creating such models, combined with the lack of appropriate tool support, still render performance modeling an esoteric discipline mastered only by a relatively small community of experts. This article summarizes the results of the Catwalk project, which aimed to create tools that automate key activities of the performance modeling process, making this powerful methodology accessible to a wider audience of HPC application developers.
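
    One way to see the payoff is extrapolation: a model fitted from a handful of small-scale runs can flag a looming bottleneck without any full-scale experiment. The sketch below uses invented coefficients purely for illustration:

```python
# Invented example: a model fitted at small scale,
# t(p) = 0.8 + 0.0005 * p * log2(p), extrapolated to unmeasured core counts.
import numpy as np

def predicted_time(p, c0=0.8, c1=0.0005):
    return c0 + c1 * p * np.log2(p)

for p in (64, 1_000, 10_000, 100_000):
    print(f"p = {p:>6}: predicted {predicted_time(p):9.1f} s")
# The p*log(p) term is small at 64 processes but dominant at 100,000, exposing
# the scalability problem long before a full-scale run is attempted.
```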